SUMMARY

This paper overviews contemporary issues in incorporating data quality
statements into spatial databases. It discusses two approaches: one
emanating from the Digital Chart of the World Project and one from a
working party within the International Cartographic Association.

1  INTRODUCTION


Digital processing of spatial data brings immense benefits in the form of rapid, precise and
sophisticated analysis, but reveals weaknesses which may not otherwise be apparent.
Computers are very precise machines, and errors and uncertainties in data can lead to serious
problems, not only in the form of inaccurate results but in the consequences of decisions made on
the basis of poor data. Capabilities that excite enthusiasm among potential users are the
ability to change scale and the ability to overlay different themes of information at random.
These capabilities are indeed exceedingly useful; they constitute much of the comparative
advantage geographic information system technology (commonly referred to as GIS) holds over
spatial analysis based on analog maps (Goodchild, 1991; Abler, 1987).

These  capabilities,  however,  can  also  mislead  decision  makers  who  are  unaware  of  the 
imprecision  inherent  in  all  cartography  and  who  are  untutored  in  the  ways  errors  compound 
when map scales are changed or when maps are merged. Burrough (1986) observes "a false lure
in the attractive, high quality cartographic products that cartographers, and now computer
graphics specialists, provide for their colleagues in environmental survey and resource
analysis. ... Many scientists and geographers know from field experience that carefully drawn
boundaries and contour lines on maps are elegant misrepresentations of changes that are often
gradual, vague or fuzzy".

Goodchild (1991) warns that "if the burgeoning GIS industry is indeed driven by false
perceptions of data accuracy, then the truth will be devastating: even the simplest products
will be suspect. The best insurance at this point is surely to sensitise the GIS user to the accuracy
issue, and to develop tools which allow spatial data handling systems to be used in ways
which are sensitive to error". That is, systems that use digital geographic information require
a method to maintain and manage their contents and processes over the long term.

Until just a few years ago, the description of data quality and its associated issues were
neglected topics. Fortunately, the topic is now recognised as an important one, and the
description of data quality is being addressed by a number of research organisations and
professional bodies throughout the world. The catalyst for this work is [...] because of
incomplete coverage, variable accuracy, inconsistencies in standards and inadequate
sources.

Two approaches are worthy of assessment. One approach emanates from the Digital Chart of
the World (DCW) project while the other emanates from the Scientific Advisory Board of the
International Cartographic Association.

1.1  Digital  Chart  of  the  World  (DCW) 

The DCW Project is a United States Defense Mapping Agency research and
development effort (in which Australia, via the Royal Australian Survey Corps, is a
cooperative partner), whose ultimate objective is the promulgation of standards for the
exchange of digital spatial information and the development and distribution of a
global topographic database on compact disk (CD-ROM) (DMA, 1991).

DCW will be a new product of the Defense Mapping Agency (DMA). It will provide
worldwide coverage using a topologically based vector data structure to digitally
represent the earth's land surface information on a microcomputer-accessible storage
medium. The 1:1,000,000 scale Operational Navigation Chart (ONC) series will provide
the majority of the information used to produce the DCW. The Jet Navigation Chart (JNC)
series will provide the information over Antarctica. Features will be collected and
stored along with their attributes at the level of detail provided on the ONCs.

The  purpose  of  the  project  is  twofold: 

•  To  develop,  refine,  and  establish  a  suite  of  standards  that  enable  the 
exchange  and  utility  of  spatial  information;  and 

•  To  perform  the  necessary  research  and  development  steps  to  produce  the 
DCW  in  compliance  with  these  standards. 

In order to ensure that the suite of standards will be compatible with the international
community, as well as the US Department of Defense, allied partners, namely Canada,
the United Kingdom and Australia, are participants in the overall research and
development.

Standards to be developed for the DCW include format standards, media standards, a
DCW product specification, and a data directory standard to include tiling, coverage
index, thematic index, gazetteer index, and spatial query index.

The data structure used for DCW is a topologically structured vector structure in a
relational model and is known as Vector Product Format (VPF). VPF also contains data
quality information so that users may evaluate the utility of the data for a particular
application.

1.2  International  Cartographic  Association  initiative 

Technological issues such as those concerned with digital data quality are also
receiving attention from working groups within professional organisations. Perhaps the
lead professional body in the disciplinary area concerned with spatial data is the
International Cartographic Association (ICA). The ICA has within its organisational
structure a number of commissions and working groups whose terms of reference, amongst
other things, include "undertaking efforts on critical topics of research". The Scientific
Advisory Board of the ICA has produced a set of guidelines as its contribution to a clear
and consistent approach to the assessment of data quality. These are presented in
Section 3, ICA Data Quality Proposal.


2  VPF  DATA  QUALITY  STATEMENT 

The  data  structure  used  for  DCW  is  a  topologically  structured  vector  structure  in  a  relational 
model  and  is  known  as  Vector  Product  Format  (VPF).  VPF  is  a  generic  geographic  data  model 
designed  to  be  used  with  any  digital  geographical  data  in  vector  format  that  can  be  represented 
using  nodes,  edges,  and  faces.  VPF  is  based  upon  the  georelational  model,  combinatorial 
topology and set theory. VPF also contains data quality information so that users may
evaluate  the  utility  of  the  data  for  a  particular  application. 

VPF contains data quality information at a number of different levels within the database,
with the detailed description being modified from the Spatial Data Quality section
(Section 4) of NCDCDS Report #7 (Moellering, 1986).

2.1  Data  quality  hierarchy 

The VPF model is a hierarchical one with information held at database, library,
coverage, feature and primitive levels. Data quality information at the database level
applies to all libraries of the database, except where those libraries contain their own
data quality information of the same kind. Similarly, data quality information at the
library level (which may have been inherited from the database) applies to all
coverages within the library, except those that contain their own data quality
information of the same kind. Coverage level data quality information applies in the
same manner to features. Feature level data quality information in turn applies to
both the spatial primitives and the attributes that compose them.
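
The inheritance rule described above can be sketched in a few lines of Python. This is an illustration only: the level names follow the text, but the dictionary layout, the function name resolve_quality and the sample values are assumptions made for the example and are not part of VPF.

```python
# Hypothetical sketch of the inheritance rule; not part of the VPF standard.
# Levels are ordered from most general to most specific.
LEVELS = ["database", "library", "coverage", "feature", "primitive"]


def resolve_quality(statements, kind, level):
    """Return the quality statement of type `kind` that applies at `level`.

    `statements` maps a level name to the quality statements held at that
    level. The nearest enclosing level holding a statement of the same
    kind wins; None is returned if no level holds one.
    """
    start = LEVELS.index(level)
    for name in reversed(LEVELS[: start + 1]):
        held = statements.get(name, {})
        if kind in held:
            return held[kind]
    return None


# Illustrative values only.
statements = {
    "database": {"source": "ONC series", "currency": "1989"},
    "coverage": {"source": "JNC series"},
}

print(resolve_quality(statements, "source", "library"))      # inherited from database
print(resolve_quality(statements, "source", "feature"))      # overridden at coverage
print(resolve_quality(statements, "currency", "primitive"))  # inherited from database
```

A statement held at a lower level replaces only the inherited statement of the same kind; other kinds continue to be inherited from above.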

2.2  Data  quality  encoding 

Data quality information is represented either as attributes or as a coverage. If
represented as attributes, it may be added to an existing VPF table or held in an
independent table residing at the appropriate level. If represented as a coverage, it
shall be a coverage whose area or complex features designate areas with uniform data
quality information of specified types. Figure 1 depicts the attribute and coverage
locations of data quality information through the database.

2.3  Types  of  data  quality  information 

There  are  seven  types  of  data  quality  information: 

•  Source. Source describes the origin or derivation of a single feature,
primitive or attribute. This includes any processing techniques applied
to the data, as well as the data source.

•  Positional  accuracy.  Positional  accuracy  provides  an  upper  bound  on  the 
deviation  of  coordinates  in  VPF  from  the  position  of  the  real  world 
entity  being  modelled.  Positional  accuracy  must  be  specified  without 
relation  to  scale  and  shall  contain  all  errors  introduced  by  source 
documents,  data  capture,  and  processing. 

•  Attribute  accuracy.  Attribute  accuracy  describes  the  accuracy  or 
reliability  of  attribute  data. 

•  Currency.  Currency  represents  the  date  at  which  the  data  was 
introduced  or  modified  in  the  database.  This  date  of  entry  is  used  as  a 
proof  of  modification  for  a  single  data  element,  permitting  statistical 
interpretation  of  groups  of  data  elements. 

•  Logical consistency. Logical consistency describes the fidelity of
relationships encoded in a VPF data set. Logical consistency requires
that all topological foreign keys match the appropriate primitive,
that all attribute foreign keys match the appropriate primitives or
features, and that all tables described in feature class schema tables do
indeed have the relationships described.

•  Feature completeness. Feature completeness indicates the degree to
which all features of a type for the area of the data set have been
included.

•  Attribute  completeness.  Attribute  completeness  indicates  the  degree  to 
which  all  attributes  of  a  feature  have  been  included  for  that  feature. 
Actually,  since  this  information  can  be  derived  from  the  feature  itself, 
simply  by  counting  null  values,  this  particular  form  of  data  quality 
information  should  not  need  to  be  explicitly  included. 
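
The remark that attribute completeness can be derived by counting null values can be illustrated with a minimal Python sketch. The feature record and its attribute names are hypothetical, and a missing value is assumed to be stored as None; VPF itself defines its own null conventions.

```python
def attribute_completeness(feature):
    """Fraction of a feature's attributes that hold a non-null value."""
    if not feature:
        return 1.0  # assumption: a feature with no attributes counts as complete
    filled = sum(1 for value in feature.values() if value is not None)
    return filled / len(feature)


# Hypothetical feature record with two of its four attributes missing.
road = {"name": "Stuart Highway", "surface": None, "lanes": 2, "width_m": None}

print(attribute_completeness(road))  # 0.5
```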

These  types  of  information  above  are  VPF's  standard  types  of  quality  data.  Product 
specifications,  such  as  the  Digital  Chart  of  the  World,  call  for  additional  types  of 
data  quality  information  as  well. 

2.4  DCW  metadata 

The  DCW  is  one  database  with  two  libraries.  The  database  level  includes  three 
tables:  a  database  header  table,  a  database  description  table  and  a  library  description 
table. The database header table contains metadata pertaining to the DCW data,
including information on security and release.

The  DCW  library  is  a  directory  containing  VPF  tables,  coverages  and  index  tables.  One 
table,  known  as  the  library  header  table,  identifies  the  data  set,  sources,  extent, 
projection,  security,  and  data  quality  information  in  the  library  (Figure  2). 

As the Digital Chart of the World is available for public release from February 1992,
its schema will be the first containing a 'data quality statement' that will be
supported as a 'standard'. Therefore, future defence data (in vector format) should
include, as a minimum, the information shown in Figure 2.

It seems unfortunate that, although 'data quality statements' have been identified as
being important, the implementation in VPF (and therefore in DCW) is somewhat
simplistic and poorly described in the accompanying documentation. This component of VPF
(and DCW) is clearly one needing further development and enhancement.


It is apparent that the developers of the 'data quality' module of DCW lacked experience and/or
knowledge in cartography and surveying. In the draft documentation there are errors and
uncertainties. Firstly, the projection is noted to be 'Unprojected' with decimal degrees but the
horizontal unit of measure is given as 'Meters' (possibly it should be expressed in arc units).
Secondly, the vertical unit is expressed as 'Meters' but the source material was an aeronautical chart
with elevation in 'Feet'. Finally, the absolute horizontal accuracy was given as +-2040 meters (perhaps
+-2 km might have been more commensurate with the source scale).

VPF column name               DCW column name   Record entry
----------------------------  ----------------  -------------------------------------
.                             ID
Product type                  PRODUCT_TYPE      DCW
Name                          LIBRARY_NAME      DCW
Data Structure Code           DATA_STRUCT_CODE  1, 2 and 6
Series                        SOURCE_SERIES     ONC
Source Identification         SOURCE_ID         Complete ONC series
Edition                       SOURCE_EDITION    Varies with source map sheet
Source Name                   SOURCE_NAME       Operational Nav Charts, Jet Nav Charts
Source Date                   SOURCE_DATE       1989
Ellipsoid Name                ELLIPSOID_NAME    WGS
Ellipsoid Code                ELLIPSOID_CODE    None
Vertical Reference Name       VERT_REF_NAME     Mean Sea Level
Vertical Reference Code       VERT_REF_CODE     MSL
Vertical Datum Code           VERT_DATUM_CODE   Unknown
Geodetic Datum Name           GEOD_DATUM_NAME   Unknown
Geodetic Datum Code           GEOD_DATUM_CODE   Unknown
Longitude of SW Corner        LON_SW_MBR        0 Longitude
Latitude of SW Corner         LAT_SW_MBR        90 South Latitude
Longitude of NE Corner        LON_NE_MBR        0 Longitude
Latitude of NE Corner         LAT_NE_MBR        90 North Latitude
Longitude                     LON_BOUND_FACE    +- 180 degrees
Latitude                      LAT_BOUND_FACE    +- 90 degrees
Projection Name               PROJECTION_NAME   Decimal degrees (Unprojected)
Projection Code               PROJECTION_CODE   Unknown
Security Classification       SECURITY_CLASS    U
Downgrading                   DOWNGRADING       No
Date                          DOWNGRADING_DAT   N/A
Releasability                 RELEASABILITY     Unrestricted
Feature Completeness          FEATURE_COMPLETE  100% of ONC
Attribute Completeness        ATTRIBUTE_COMPL   100% of ONC
Consistency                   LOGICAL_CONSIST   TBD
Edition Number                DATASET_ED_NO     1
Creation Date                 CREATION_DATE     TBD
Revision Date                 DATASET_REV_DATE  TBD
Recompilation Date            RECOMP_COUNT      0
Revision Count                REVISION_COUNT    0
Specification ID              PRODUCT_SPEC_ID   MIL-D-89009
Date                          SPEC_DATE         April 29, 1991
Amendment                     SPEC_AMENDMENT    N/A
Earliest Source               earliest_source   1971
Latest Source                 latest_source     1989
Quantitative Attribute        QUANT_ATTRIBUTE   Unknown
Qualitative Attribute         QUAL_ATTRIBUTE    TBD
Collection Criteria           COLLECTION_SPEC   ONC Spec and DCW Design Criteria
Absolute Horizontal Accuracy  ABS_HORIZ_ACC     +- 2040 meters
Unit of Measure               HORIZ_UNITS       Meters
Absolute Vertical Accuracy    ABS_VERT_ACC      610 meters
Unit of Measure               vertical_units    Unknown
Relative Horizontal Accuracy  PT_PT_HORIZ_ACC   N/A
Relative Vertical Accuracy    pt_pt_vert_acc    N/A
Comments                      COMMENTS          Source map editions from 1971 to 1989

Figure 2: Schema for DCW Library Header Table

The second initiative referred to earlier, that by the International Cartographic
Association, offers an approach to improve on the weaknesses in the VPF 'data quality
statement'.


3  ICA  DATA  QUALITY  PROPOSAL 

The  International  Cartographic  Association  (ICA)  through  its  Scientific  Advisory  Board,  has 
developed  a  set  of  guidelines  as  its  contribution  to  a  clear  and  consistent  approach  to  the 
assessment  of  data  quality. 

The guidelines are intended to satisfy certain basic requirements:

•  Defensible. Qualitative rating schemes like 'high', 'medium' and 'low' would
be difficult to defend because of subjectivity, in the form of inconsistency
between assessors, and confusion over what the terms mean. The guidelines
emphasise objective measurement, with summaries as simple, unambiguous
choices.

•  Informative. The purpose of a rating should be to give the user the greatest
possible amount of useful information. If ratings are to be designed by a testing
scheme, they should be designed to pass as many detailed results of testing as
possible on to the user. They should reflect likely uses by anticipating what the
user will be doing with the data.

•  Definitive. It is important that the differences between ratings be as definitive
as possible, and not based on subjective scales of assessment.

Rather than attempt to assess quality in an absolute sense, the guidelines emphasise the
quality of data relative to user needs and anticipated uses, by comparing reality to likely
expectations.  In  many  cases  spatial  databases  are  assembled  from  well  known  and  widely 
distributed  sources,  so  an  important  measure  of  quality  is  the  degree  to  which  the  information 
content  of  the  source  has  been  captured  accurately  in  the  database:  this  relative  measure  may 
be  more  useful  to  the  potential  user  than  an  absolute  measure  of  quality. 

The  guidelines  use  certain  terms  which  require  definition: 

•  Reality:  independently  verifiable  ground  truth;  an  item  of  information  that  can 
be  verified  by  visiting  the  appropriate  place  on  the  earth's  surface  and  making 
a  measurement  or  observation; 

•  Source: the documents (often maps) from which the database was built. The
source is assumed to be available for assessment of the quality of the database;



UNCLASSIFIED 



ERL-0632-RN


•  Database: the product being tested; a set of digital records organised in some
appropriate structure. The assessment of quality extends not only to the records
themselves, but also to information that can be deduced from the records by
simple processes. For example, a user may wish to know the accuracy of the
length of a digital line, whether length is stored explicitly in the database or
computed from the line's coordinates;

•  Source errors: inaccuracies apparent in the source when its contents are
compared to reality. These may include the uncertainties due to different
interpretations of ground truth;

•  Processing errors: inaccuracies introduced by digital processing (including
digitising) and thus apparent in the database when its contents are compared to
the source.

The guidelines describe two distinct approaches, and each has two levels: overall summary
rating, and detailed assessment. In the latter area, sections of the guidelines have been adapted
and modified from the Spatial Data Quality section (Section 3) of the proposed US National
Spatial Data Transfer Standard (SDTS). This standard will be the basis of the proposed
Australian Spatial Data Transfer Standard (ASDTS) (Moellering, 1986).

The intent of the guidelines is that they be used to assemble an informative Data Quality
Statement to accompany the database.

3.1  Overall  Summary  Rating 

A  summary  rating  is  assessed  using  one  of  two  methods,  depending  on  whether  accuracy 
is  determined  with  respect  to  source  document  or  ground  truth. 

3.1.1  Method  1 

Method  1  is  used  to  assess  databases  with  respect  to  source  documents,  but  also 
must  address  the  quality  of  the  source  document  itself,  usually  by  reference  to 
independent  reports.  A  Method  1  rating  has  two  parts,  e.g.  A1  C,  denoting  a 
database  that  captures  accurately  the  entire  contents  of  a  source  document  of 
unknown  quality.  These  parts  are: 

•  A measure of the relationship of the digital database to its source; and

•  A statement of the quality of the source.

Refer to Figures 3 and 4 for the measures of rating for Method 1.

Figure 3: Summary Rating - Method 1 - Relationship between database and source

3.1.2  Method 2

Method  2  assesses  the  quality  of  the  database  by  direct  reference  to  ground 
truth,  and  has  only  one  part:  that  being  a  statement  of  the  relationship  of  the 
database  to  ground  truth.  Refer  to  Figure  5. 

3.2  Detailed  Assessment 

Spatial  databases  frequently  contain  multiple  themes,  often  from  different  sources.  A 
detailed  assessment  of  data  quality  must  address  each  theme  individually, 
particularly in comparisons with ground truth. Detailed assessment is relevant in two
cases:

•  In determining the accuracy of the database in relation to its source
(Method 1 above); and

•  In determining the accuracy of the database in relation to ground truth
(Method 2).

There  are  significant  differences  in  the  approaches  in  the  two  cases. 

3.2.1  Method 1

Each assessment consists of five sections:

•  Lineage; 

•  Positional  accuracy; 

•  Attribute  accuracy; 

•  Logical  consistency;  and 

•  Completeness. 

3.2.1.1  Lineage 

The  lineage  portion  of  a  quality  report  includes  a  description  of  the 
source  material  from  which  the  data  were  derived  and  the  methods  of 
derivation,  including  all  transformations  involved  in  producing  the 
final  digital  files.  The  description  should  include  the  dates  of  the 
source  material  and  the  dates  of  ancillary  information  used  for  update. 
The  date  assigned  to  a  source  should  reflect  the  date  that  the 
information  corresponds  to  the  ground:  however,  if  this  date  is  not 
known,  then  a  date  of  publication  may  be  used,  if  declared  as  such. 

Any database created by merging information obtained from distinct
sources should be described in sufficient detail to identify the actual
source for each element. In these cases, either a lineage code on each
element or a quality overlay (source data index, etc.) should be
provided.

The lineage report should include information on all coordinate
transformations applied to the data, including changes of projections,
and the parameters used in each transformation (e.g. figures of the
earth).

3.2.1.2  Positional  accuracy 

Descriptions  of  positional  accuracy  should  consider  the  quality  of  the 
final  product  after  all  transformations.  The  information  on 
transformations  forms  a  part  of  the  lineage  portion  of  the  quality 
report. 

Measures of positional accuracy may be obtained by one of the following
optional methods:

•  Deductive estimate: an estimate of positional accuracy
based on knowledge of the errors introduced in each
production step. Any deductive statement should
describe the assumptions made concerning error
propagation (e.g. independence);

•  Internal evidence: an estimate based on repeated
measurements, e.g. by having several operators digitise
the same source material;

•  Comparison to source: an estimate based on graphic
inspection of results and comparison with source ('check
plots'); and

•  Independent  source  of  higher  accuracy:  the  preferred 
test  for  positional  accuracy  is  a  comparison  to  an 
independent  source  of  higher  accuracy.  The  number  of 
test  points  and  sampling  design  should  be  reported. 
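
As an illustration of the preferred test, the sketch below computes a horizontal root mean square error from test points matched against an independent source of higher accuracy. It is an example under stated assumptions, not a procedure prescribed by the guidelines: the function name and coordinates are hypothetical, and the points are assumed to be matched pairs in a planar coordinate system with consistent units.

```python
import math


def horizontal_rmse(tested, reference):
    """Root mean square of horizontal distances between matched points."""
    if len(tested) != len(reference) or not tested:
        raise ValueError("need equal, non-empty lists of matched points")
    squared = [
        (xt - xr) ** 2 + (yt - yr) ** 2
        for (xt, yt), (xr, yr) in zip(tested, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical check points: database positions and higher accuracy positions.
tested = [(3.0, 4.0), (10.0, 10.0)]
reference = [(0.0, 0.0), (10.0, 10.0)]

rmse = horizontal_rmse(tested, reference)
print(f"{len(tested)} test points, RMSE = {rmse:.2f}")
```

Printing the number of test points alongside the result reflects the requirement that the number of test points and the sampling design be reported with the estimate.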

3.2.1.3  Attribute  accuracy 

Accuracy assessment for measures on a continuous scale (interval/ratio)
should be expressed in terms of a numerical estimate of expected
discrepancies (standard or RMS error). Accuracy for measures on a
discrete scale (nominal) should be given as percent correct, which could
be expressed in the form of a misclassification matrix with summary
statistic for the classified attributes. Sampling design and sample size
should be reported.
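
A misclassification matrix and its percent correct summary can be sketched as follows. The class labels, the sample and the function names are hypothetical, and the matrix is held as a simple count of (reference class, recorded class) pairs rather than in any format required by the guidelines.

```python
from collections import Counter


def misclassification_matrix(recorded, reference):
    """Count (reference class, recorded class) pairs over a sample."""
    if len(recorded) != len(reference):
        raise ValueError("samples must be the same length")
    return Counter(zip(reference, recorded))


def percent_correct(matrix):
    """Percentage of sampled items whose recorded class matches the reference."""
    total = sum(matrix.values())
    if total == 0:
        raise ValueError("empty sample")
    correct = sum(n for (ref, rec), n in matrix.items() if ref == rec)
    return 100.0 * correct / total


# Hypothetical sample of five checked sites.
reference = ["forest", "forest", "urban", "water", "urban"]
recorded = ["forest", "urban", "urban", "water", "urban"]

matrix = misclassification_matrix(recorded, reference)
print(matrix[("forest", "urban")])  # 1 forest site recorded as urban
print(percent_correct(matrix))      # 80.0
```

The off-diagonal counts show which classes are confused with which, information that the single percent correct figure does not carry.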

3.2.1.4  Logical  consistency 

A report on logical consistency should describe the fidelity of
relationships encoded in the data structure of the database. Tests for
permissible values may be applied to any data structure. Such a test can
detect gross blunders, but does not ensure all aspects of logical
consistency. A database containing lines may be subjected to general
questions such as: Do lines intersect only where intended? Are any lines
entered twice? Are all areas completely described? Are there any
overshoots or undershoots? Are any polygons too small, or any lines too
close?
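
Two of these questions lend themselves to a short automated check. The sketch below looks for lines entered twice and for line ends that meet no other line, which are candidates for overshoots or undershoots. It is a simplified illustration: lines are assumed to be lists of coordinate tuples, coordinates are compared for exact equality (real data would need a tolerance), and the function names are hypothetical.

```python
from collections import Counter


def duplicate_lines(lines):
    """Return lines entered more than once, ignoring digitising direction."""
    def key(line):
        points = tuple(line)
        return min(points, points[::-1])

    counts = Counter(key(line) for line in lines)
    return [list(k) for k, n in counts.items() if n > 1]


def dangling_endpoints(lines):
    """Return end points used by only one line end."""
    ends = Counter()
    for line in lines:
        ends[line[0]] += 1
        ends[line[-1]] += 1
    return sorted(point for point, n in ends.items() if n == 1)


# Hypothetical line work: a triangle, one repeated edge and one dangling line.
lines = [
    [(0, 0), (1, 0)],
    [(1, 0), (1, 1)],
    [(1, 1), (0, 0)],
    [(1, 0), (0, 0)],  # same as the first line, digitised in reverse
    [(1, 1), (2, 2)],  # ends at (2, 2) without meeting another line
]

print(duplicate_lines(lines))     # [[(0, 0), (1, 0)]]
print(dangling_endpoints(lines))  # [(2, 2)]
```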

For exhaustive areal coverage data transmitted as chains or derived
from chains (see the layer model discussion below), it is permissible to
report logical consistency as 'topologically clean' under the condition
that an automated procedure has verified the following conditions:

•  All chains intersect at nodes;

•  Cycles of chains and nodes are consistent around
polygons. Or, alternatively, cycles of chains and
polygons are consistent around nodes; and

•  Inner rings embed consistently in enclosing polygons.

3.2.1.5  Completeness 

The quality report should include information about selection criteria,
definitions used and other relevant rules used to capture features from
the source. For example, geometric thresholds such as a minimum area
or minimum width should be reported.

The report on completeness should describe the relationship between
the objects represented and the abstract universe of all such objects
present in the source. In particular, the report should describe the
exhaustiveness of a set of features.

3.2.2  Method 2

Two  different  strategies  are  acceptable,  depending  on  the  nature  of  the  theme: 

•  Layers;  and 

•  Objects. 

Each  assessment  consists  of  five  sections: 

•  Lineage; 

•  Positional  accuracy; 

•  Attribute  accuracy; 

•  Logical  consistency;  and 

•  Completeness. 

3.2.2.1  Layers 

The theme represents a single variable with a value everywhere, e.g. a
map of soil class, land use, or elevation. The database will likely be
expected to provide estimates of the value of the variable at specific
points, and the measure of accuracy should inform the user of the
uncertainty involved in determining such values.

3.2.2.2  Objects

The theme consists of a set of well-defined geographic features with
associated attributes. Features should be sufficiently well-defined to be
identifiable on the ground, allowing a test of positional accuracy to be
made with respect to ground truth. Building footprints, shorelines,
rivers, mountain peaks, bridges and roads are examples of well-defined
geographic features. In cases where the object is highly interpreted and
thus not suitable for ground truth (an independent observer could not
reasonably be expected to identify correctly whether an arbitrarily
chosen point was located inside the object or not), accuracy cannot be
evaluated (e.g. location of object 'The Top End' of the Northern
Territory).

Accuracy should be assessed using the same five categories identified
above (lineage, positional and attribute accuracy, logical consistency
and completeness). For the layer model, positional accuracy should be
omitted as it is not relevant, but attribute accuracy is particularly
important, and attention should also be paid to the data structure
aspects of logical consistency. For the object model, positional accuracy
is particularly important, but the data structure will likely impose few
logical consistency conditions.

3.2.2.3  Lineage

The lineage portion includes a description of the entire process of data
handling from raw ground observations through to the digital
database, including all transformations involved in producing the final
digital files. The description should include the dates of raw
observations, and the dates of ancillary information used for
interpretation or update.

Any  database  created  by  merging  information  obtained  from  distinct 
sources  should  be  described  in  sufficient  detail  to  identify  the  actual 
source  for  each  element.  In  these  cases,  either  a  lineage  code  on  each 
element  or  a  quality  overlay  (source  data  index,  etc.)  should  be 
provided. 

The  lineage  report  should  include  information  on  all  coordinate 
transformations  applied  to  the  data,  including  changes  of  projections, 
and  the  parameters  used  in  each  transformation  (e.g.  figures  of  the 
earth). 

3.2.2.4  Positional accuracy (object model only)

Descriptions  of  positional  accuracy  should  consider  the  quality  of  the 
final  product  after  all  transformations.  The  information  on 
transformations  forms  a  part  of  the  lineage  portion  of  the  quality 
report. 

Measures  of  positional  accuracy  may  be  obtained  by  one  of  the  following 
optional  methods: 

•  Deductive  estimate:  an  estimate  of  positional  accuracy 
based  on  knowledge  of  the  errors  introduced  in  each 
production  step  from  raw  observations  to  digital 
database.  Any  deductive  statement  should  describe  the 
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assumptions  made  concerning  error  propagation  (eg. 
independence); 

•  Internal  evidence:  an  estimate  based  on  repeated 
measurements,  e  g.  by  having  several  operators  collect 
and  process  the  same  data;  and 

•  Comparison  to  ground  truth:  an  estimate  based  on  an  actual
ground  check  of  the  positions  of  objects,  e.g.  using  GPS.
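Two of the methods above can be illustrated with a short Python sketch. It assumes that the errors of the production steps are independent, so that their variances add (the assumption a deductive statement would need to declare), and that ground-truth check points are supplied in the same coordinate system as the database. The function names and sample figures are hypothetical.

```python
import math


def deductive_estimate(step_errors):
    """Combine per-step standard errors, assuming the steps are independent.

    Under independence the variances add, so the combined standard error
    is the square root of the sum of squares.
    """
    return math.sqrt(sum(e * e for e in step_errors))


def rmse_against_ground_truth(database_xy, ground_xy):
    """Root-mean-square horizontal discrepancy at a set of check points."""
    if len(database_xy) != len(ground_xy) or not database_xy:
        raise ValueError("need two equal-length, non-empty point lists")
    squared = [
        (xd - xg) ** 2 + (yd - yg) ** 2
        for (xd, yd), (xg, yg) in zip(database_xy, ground_xy)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative figures only: two production steps with standard errors
# of 3 m and 4 m combine to 5 m.
combined = deductive_estimate([3.0, 4.0])

# Two check points, one of which is displaced by (3, 4) in the database.
observed = rmse_against_ground_truth(
    [(0.0, 0.0), (10.0, 10.0)],
    [(3.0, 4.0), (10.0, 10.0)],
)
```

If the step errors are correlated, the root-sum-square figure no longer applies, which is why the assumptions behind a deductive estimate should be stated alongside it.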

3.2.2.5  Attribute  accuracy 

Accuracy  assessment  for  measures  on  a  continuous  scale  (interval/ratio)
should  be  expressed  in  terms  of  a  numerical  estimate  of  expected
discrepancies  (standard  or  RMS  error).  Accuracy  for  measures  on  a
discrete  scale  (nominal)  should  be  given  as  percent  correct,
expressed  in  the  form  of  a  misclassification  matrix  with  a
summary  statistic  for  classified  attributes.  Sampling  design  and  sample
size  should  be  reported.  Attribute  accuracy  may  be  assessed  by 
comparison  to  ground  truth,  internal  evidence  or  deductive  estimates. 
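The two kinds of measure can be sketched in Python as follows. The sketch assumes a sample of paired values, a reference (ground truth) value and the value held in the database, and it uses percent correct as the summary statistic. The class labels and numbers are invented for illustration.

```python
import math
from collections import Counter


def rms_error(reference, measured):
    """RMS discrepancy for an attribute on a continuous scale."""
    if len(reference) != len(measured) or not reference:
        raise ValueError("need two equal-length, non-empty samples")
    return math.sqrt(
        sum((r - m) ** 2 for r, m in zip(reference, measured)) / len(reference)
    )


def misclassification_matrix(reference, classified):
    """Count (reference class, database class) pairs over the sample."""
    if len(reference) != len(classified) or not reference:
        raise ValueError("need two equal-length, non-empty samples")
    return Counter(zip(reference, classified))


def percent_correct(matrix):
    """Share of sampled items whose database class matches the reference."""
    total = sum(matrix.values())
    agree = sum(n for (ref, cls), n in matrix.items() if ref == cls)
    return 100.0 * agree / total


# Illustrative sample of five checked features.
reference = ["forest", "forest", "water", "urban", "water"]
classified = ["forest", "urban", "water", "urban", "water"]
matrix = misclassification_matrix(reference, classified)
score = percent_correct(matrix)  # 4 of 5 agree, i.e. 80.0

# Illustrative continuous attribute, e.g. spot heights in metres.
height_rms = rms_error([10.0, 20.0], [13.0, 16.0])
```

The off-diagonal entries of the matrix, here one forest feature recorded as urban, show which classes are confused with which, something a single percentage cannot convey.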

3.2.2.6  Logical  consistency 

A  report  on  logical  consistency  should  describe  the  fidelity  of 
relationships  encoded  in  the  data  structure  of  the  database.  Tests  for 
permissible  values  may  be  applied  to  any  data  structure.  Such  a  test  can 
detect  gross  blunders,  but  does  not  ensure  all  aspects  of  logical 
consistency.  A  database  containing  lines  may  be  subjected  to  general
questions  such  as:  Do  lines  intersect  only  where  intended?  Are  any  lines
entered  twice?  Are  all  areas  completely  described?  Are  there  any 
overshoots  or  undershoots?  Are  any  polygons  too  small,  or  any  lines  too 
close? 
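Two of these checks lend themselves to a brief Python sketch: a test for permissible values and a test for lines entered twice. The sketch assumes that records are dictionaries and that lines are sequences of coordinate pairs; both representations, and the sample data, are assumptions made for illustration.

```python
def invalid_values(records, field, permitted):
    """Return the records whose value for `field` is not a permitted one."""
    return [r for r in records if r.get(field) not in permitted]


def duplicate_lines(lines):
    """Return indices of lines that repeat an earlier line.

    A line digitised in the opposite direction counts as the same line.
    """
    seen = set()
    duplicates = []
    for i, coords in enumerate(lines):
        pts = tuple(tuple(p) for p in coords)
        key = min(pts, pts[::-1])  # same key for a line and its reverse
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates


# Illustrative data: one misspelt class code and one line entered twice
# (the second time in reverse).
records = [{"id": 1, "class": "road"}, {"id": 2, "class": "raod"}]
bad_records = invalid_values(records, "class", {"road", "river"})

lines = [
    [(0, 0), (1, 1)],
    [(1, 1), (2, 2)],
    [(1, 1), (0, 0)],
]
repeated = duplicate_lines(lines)
```

As the text notes, such tests catch gross blunders only. A database can pass both and still contain overshoots, undershoots or unintended intersections.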

For  exhaustive  areal  coverage  data  transmitted  as  chains  or  derived 
from  chains  (see  the  layer  model  discussion  below),  it  is  permissible  to
report  logical  consistency  as  'topologically  clean'  under  the  condition
that  an  automated  procedure  has  verified  the  following  conditions:

•  All  chains  intersect  at  nodes;

•  Cycles  of  chains  and  nodes  are  consistent  around 
polygons.  Or,  alternatively,  cycles  of  chains  and 
polygons  are  consistent  around  nodes;  and 
•  Inner  rings  embed  consistently  in  enclosing  polygons.
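The first of these conditions can be sketched in Python as below. This is a simplified illustration, not a complete verification procedure: it assumes each chain is a list of (x, y) tuples whose first and last points are its nodes, it compares different chains only (self-intersection within one chain is not examined), and it does not address the cycle or inner-ring conditions. All names are hypothetical.

```python
from itertools import combinations


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a)."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _within_box(a, b, p):
    """True if p lies inside the bounding box of segment a-b."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def _segments_touch(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 share at least one point."""
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _within_box(p1, p2, q1))
            or (o2 == 0 and _within_box(p1, p2, q2))
            or (o3 == 0 and _within_box(q1, q2, p1))
            or (o4 == 0 and _within_box(q1, q2, p2)))


def _contact_allowed(seg_a, seg_b, shared_nodes):
    """Contact is allowed only at a single point that is a node of both chains."""
    common = set(seg_a) & set(seg_b)
    if len(common) != 1:
        return False
    (node,) = common
    if node not in shared_nodes:
        return False
    pa = seg_a[0] if seg_a[1] == node else seg_a[1]
    pb = seg_b[0] if seg_b[1] == node else seg_b[1]
    if _orient(node, pa, pb) != 0:
        return True
    # Collinear segments: acceptable only if they leave the node in
    # opposite directions, otherwise they overlap.
    dot = ((pa[0] - node[0]) * (pb[0] - node[0])
           + (pa[1] - node[1]) * (pb[1] - node[1]))
    return dot < 0


def chains_meeting_off_node(chains):
    """Return index pairs of chains that touch somewhere other than a shared node."""
    bad = []
    for (i, a), (j, b) in combinations(enumerate(chains), 2):
        shared_nodes = {a[0], a[-1]} & {b[0], b[-1]}
        segs_a = list(zip(a, a[1:]))
        segs_b = list(zip(b, b[1:]))
        if any(_segments_touch(*sa, *sb)
               and not _contact_allowed(sa, sb, shared_nodes)
               for sa in segs_a for sb in segs_b):
            bad.append((i, j))
    return bad


# Illustrative chains: the first two cross in mid-segment, while the third
# joins the first at a common node.
chains = [
    [(0, 0), (2, 2)],
    [(0, 2), (2, 0)],
    [(2, 2), (3, 2)],
]
violations = chains_meeting_off_node(chains)
```

The pairwise comparison is quadratic in the number of segments. A production procedure would normally use a spatial index or a sweep-line method, and would use tolerance-aware arithmetic in place of the exact coordinate comparisons used here.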

3.2.2.7  Completeness 

The  quality  report  should  include  information  about  selection  criteria, 
definitions  used  and  other  relevant  rules  used  to  capture  features  from 
the  source.  For  example,  geometric  thresholds  such  as  a  minimum  area 
or  minimum  width  should  be  reported. 

The  report  on  completeness  should  describe  the  relationship  between 
the  objects  represented  and  the  abstract  universe  of  all  such  objects  in 
reality.  In  particular,  the  report  should  describe  the  exhaustiveness  of 
a  set  of  features. 


4  IMPLEMENTATION  STRATEGY 

As  digital  geographic  data  has  not  usually  contained  details  of  data  quality  explicitly  within 
its  structure  or  in  associated  documentation,  there  is  a  requirement  to  formulate  an 
implementation  and  management  strategy  to  incorporate  this  form  of  information.  Such  a 
strategy  needs  to  take  into  consideration  the  diversity  of  forms  and  formats  currently  in 
existence  as  well  as  the  sheer  magnitude  of  the  task  if  fine  detail  is  required  immediately  for 
all  data  assets  (not  only  from  within  Defence  but  also  the  wider  community). 

An  implementation  strategy  is  complex  and  involves  knowledge  of  digital  data  requirements, 
production  and  acquisition  priorities,  and  coordination  through  a  number  of  ADF  organisations. 
It  is  therefore  the  subject  of  another  study.  A  strategy  would  include,  however,  a  number  of 
steps: 

•  Compilation  of  a  register  of  digital  data  assets  of  defence  and  civilian 
agencies; 

•  Assembling  an  overall  summary  rating  of  the  data  sets;  and 

•  Producing  detailed  descriptions  for  the  data  sets. 

Any  implementation  plan,  however,  involves  a  'cost'.  Such  a  'cost'  should  be
considered  not  only  in  terms  of  dollars  and  man-hours;  it  should  also  be  evaluated  against  benefits  to
Defence  systems.  As  technology  evolves,  future  weapons  systems,  navigation  systems,  command 
and  control,  targeting,  and  intelligence  systems  will  become  'smarter';  and  the  'smarter'  the 
systems  become  the  more  reliance  there  will  be  on  the  data  on  which  they  base  their
'decisions'.  This  means  that  the  systems  will  require  detailed  knowledge  of  the  quality  and
reliability  of  the  data  (similar  to  those  discussed  in  the  ICA  Data  Quality  Proposal). 

In  the  meantime,  there  are  in  excess  of  thirty  separate  projects  (that  need  to  access  digital 
geographic  data  in  one  form  or  another)  being  staffed  in  the  Forces  Executive,  Navy,  Army  and 
Air  Force  acquisition  programs.  It  therefore  seems  appropriate  to  commence  the
implementation  process  of  applying  'data  quality  labels'  to  existing  data  sets  and  those  in
current  production,  and  to  guidelines  compatible  with  our  Defence  partners.  For  example,  a
number  of  systems  (such  as  the  F/A-18  Mission  Data  Planning  Facility,  Electronic  Chart 
Display  and  Information  System,  Mine  Warfare  Systems  Centre  Information  System, 
Australian  Army  Tactical  Command  Support  System,  and  Operational  Movements  Planning 
System)  require  digital  feature  data  for  a  range  of  analyses,  and  it  seems  appropriate  to  format 
these  data  and  include  'quality  statements'  that  are  being  introduced  as  MILITARY
STANDARDS  by  other  ABCA  organisations.  As  such,  the  VPF  Data  Quality  Statement  should 
be  used  as  Stage  One  of  an  implementation  strategy. 